-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: drain and volume detachment status conditions #1876
base: main
Are you sure you want to change the base?
Conversation
4bb4d97
to
fb3ac47
Compare
Pull Request Test Coverage Report for Build 13248558350Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
/assign @engedaam |
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity. |
/remove-lifecycle stale |
b527992
to
21176e1
Compare
/hold |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: engedaam, jmdeal The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d536a96
to
43949ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
}) | ||
if !nodeClaim.StatusConditions().Get(v1.ConditionTypeDrained).IsTrue() { | ||
stored := nodeClaim.DeepCopy() | ||
_ = nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeDrained) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the modified check here as well? Would probably work just as well as the IsTrue check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't because we knew it always would be and this limited the scope of stored
, I can go either way though. The alternative would be:
stored := nodeClaim.DeepCopy()
if nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeDrained) {
// remaining logic
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think it tangibly matters -- good either way, just thought it was odd that we didn't keep consistent style here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe throw a comment over this to explain why it's coded this way? I found myself guessing about it again as I was reading through it -- had to re-read the context to understand that this is because of the patch
// 404 = the nodeClaim no longer exists | ||
if errors.IsNotFound(err) { | ||
continue | ||
if volumesDetached { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to cleanup some of this logic so that it's not so nested -- comments might also help too -- the fact that we fall-through when volumes are detached and are able to get to the bottom of the function (we don't requeue) is a bit confusing IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've messed around with a few different ways of structuring this, but this ends up being the most readable IMO. I will add some comments to call out the fall-through behavior. FWIW, this does clean up in my larger refactor PR.
43949ef
to
4ed938c
Compare
New changes are detected. LGTM label has been removed. |
4ed938c
to
015140d
Compare
f4ba729
to
233f43a
Compare
/hold cancel |
233f43a
to
1ddb121
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close to me! Just a few small things! Nice work!
@@ -64,6 +64,7 @@ func IgnoreNodeClaimNotFoundError(err error) error { | |||
// DuplicateNodeClaimError is an error returned when multiple v1.NodeClaims are found matching the passed providerID | |||
type DuplicateNodeClaimError struct { | |||
ProviderID string | |||
NodeClaims []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeClaims []string | |
NodeClaimNames []string |
@@ -60,6 +60,7 @@ func TestAPIs(t *testing.T) { | |||
} | |||
|
|||
var _ = BeforeSuite(func() { | |||
fakeClock = clock.NewFakeClock(time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add the fakeClock here -- doesn't look like it is being used anywhere in this testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's passed down to the terminator and was causing nil panics. We could probably scope it down and just pass it there, but I don't mind keeping it test global so it's obvious to use the fake clock passed to the terminator rather than a new one in any future tests that use it.
}) | ||
if !nodeClaim.StatusConditions().Get(v1.ConditionTypeDrained).IsTrue() { | ||
stored := nodeClaim.DeepCopy() | ||
_ = nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeDrained) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe throw a comment over this to explain why it's coded this way? I found myself guessing about it again as I was reading through it -- had to re-read the context to understand that this is because of the patch
// 404 = the nodeClaim no longer exists | ||
if errors.IsNotFound(err) { | ||
continue | ||
if len(pendingVolumeAttachments) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how large this volume attachment code is now, what do you think about moving this into a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been resistant to that because I find the requeue logic to be significantly harder to follow as soon as you move it into a separate function - I had a similar issue when I tried to structure this as subreconcilers. I had been punting those architectural changes for my larger refactor PR. If you don't feel strongly about it I'd rather punt for that, since I think the longer reconcile function is a better than obfuscating the control flow, but let me know what you think.
if errors.IsConflict(err) { | ||
return reconcile.Result{Requeue: true}, nil | ||
} | ||
return reconcile.Result{}, fmt.Errorf("ensuring instance termination, %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to handle the NotFound error that we get back from this EnsureTerminated call in the same way as we do elsewhere here
@@ -908,3 +957,13 @@ func ExpectNodeWithNodeClaimDraining(c client.Client, nodeName string) *corev1.N | |||
Expect(node.DeletionTimestamp).ToNot(BeNil()) | |||
return node | |||
} | |||
|
|||
func ExpectNotRequeued(result reconcile.Result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are two functions that I think could be generally useful and I would consider putting in operatorpkg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I've opened a PR: awslabs/operatorpkg#130
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fixes #N/A
Description
Adds status conditions for node drain and volume detachment to improve observability for the individual termination stages. This is a scoped down version of #1837, which takes these changes along with splitting each termination stage into a separate controller. I will continue to work on that refactor, but I'm decoupling to work on higher priority work.
Status Conditions:
Drained
VolumesDetached
Drained
transitions to true.How was this change tested?
make presubmit
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.